SYCL Track Finding, main branch (2024.11.14.) #773
Conversation
@@ -36,7 +36,7 @@ struct apply_interaction_payload {
     * @brief View object to the vector of boolean-like integers describing
     * whether each parameter is live. Has the same size as \ref params_view
     */
    vecmem::data::vector_view<const unsigned int> params_liveness_view;
Is there a reason for this change? CUDA doesn't natively support 8-bit loads, so I'm a bit worried about the performance implications of this. Also, is there a reason to use char and not unsigned char?
We use these values as bools. Using 32 bits where we only need 1 seems very silly. Of course bool doesn't work. 😦 But our convention in the offline code is to use char when we need "boolean information" but bool can't be used.
Also, take this into account: https://github.com/acts-project/traccc/blob/main/device/cuda/src/finding/finding_algorithm.cu#L154
What do you think is actually getting set for this buffer with that operation? 😏 Because it's not 0x1 values in the unsigned int variables...
In any case, I can't see why we shouldn't go for this. Even if NVIDIA always copies at least 16 bits, right now we move 32 bits in all cases, even though we only need 1. If some of the loads are next to each other, this could still win us a little bit.
> What do you think is actually getting set for this buffer with that operation? 😏 Because it's not 0x1 values in the unsigned int variables...

It's setting a non-zero value; I don't see the problem?
> In any case, I can't see why we shouldn't go for this. Even if NVIDIA always copies at least 16 bits, right now we move 32 bits in all cases. Even though we only need 1. If some of the loads are next to each other, this could still win us a little bit.

Think of what happens if four adjacent threads want to write their chars to global memory at the same time.
I seriously do not see why we need to change this to save what boils down to 240 kilobytes...
Logically the code worked. But you can't argue that we're not spending more time memsetting these values, and using more global memory on them, than we need to. 🤔
     unsigned int inline TRACCC_DEVICE getLocalThreadId() const {
         return threadIdx.x;
     }

-    std::size_t inline TRACCC_DEVICE getLocalThreadIdX() const {
+    unsigned int inline TRACCC_DEVICE getLocalThreadIdX() const {
         return threadIdx.x;
     }

-    std::size_t inline TRACCC_DEVICE getGlobalThreadId() const {
+    unsigned int inline TRACCC_DEVICE getGlobalThreadId() const {
         return threadIdx.x + blockIdx.x * blockDim.x;
     }

-    std::size_t inline TRACCC_DEVICE getGlobalThreadIdX() const {
+    unsigned int inline TRACCC_DEVICE getGlobalThreadIdX() const {
         return threadIdx.x + blockIdx.x * blockDim.x;
     }

-    std::size_t inline TRACCC_DEVICE getBlockIdX() const { return blockIdx.x; }
+    unsigned int inline TRACCC_DEVICE getBlockIdX() const { return blockIdx.x; }

-    std::size_t inline TRACCC_DEVICE getBlockDimX() const { return blockDim.x; }
+    unsigned int inline TRACCC_DEVICE getBlockDimX() const {
+        return blockDim.x;
+    }

-    std::size_t inline TRACCC_DEVICE getGridDimX() const { return gridDim.x; }
+    unsigned int inline TRACCC_DEVICE getGridDimX() const { return gridDim.x; }
? Why? 😕
These are all unsigned int-s natively. If we return them as std::size_t, I'll have to add a whole lot of static_cast-s in the kernels to make sure that we wouldn't use unsigned long-s where unsigned int is expected.
As it turns out, our CUDA build does not look for type conversions / narrowings. oneAPI does. So I got a lot of warnings from such things.
Okay, but why not just add the static_casts then? getGlobalThreadIdX() can return a number larger than an unsigned integer can store.
Also I am not sure why the OneAPI compiler even sees a type that is supposed to be used only in CUDA code. That's worrying. 😟
The kernels were/are receiving std::size_t values for the thread ID. And then they use those values in a lot of functions that expect unsigned int values. That's where I was getting the warnings from.
But as long as I'm modifying the signature of the traccc::device:: functions, I thought I'd also have a look at what's going on in the CUDA code... 🤔
         return threadIdx.x + blockIdx.x * blockDim.x;
     }

-    std::size_t inline TRACCC_DEVICE getGlobalThreadIdX() const {
+    unsigned int inline TRACCC_DEVICE getGlobalThreadIdX() const {
🤔 I think it would be fair to put an assertion into this function that this value would not be larger than unsigned int. At the same time, notice that the calculation on line 30 works with unsigned int itself. If the result of that overflows, the std::size_t return type will not do anything to help. 🤔
Unfortunately the situation is that the AMD backend of oneAPI doesn't seem to work with the code. 😦 I now tried a number of incantations, but linking still fails. Of course I've been developing the code with the NVIDIA backend, where it compiles/links fine. (Note that there are no runtime tests in the code just yet.) With the Intel backend one needs to be a bit careful as well, but at least the build can succeed with that one. But with the AMD backend there seem to be some bigger issues. 😢
Some interesting stuff going on. 🤔 After fixing yet one more bug in the common track finding code, I can now get "this far":
Surprisingly, we run out of registers on this piece of code:

    oneapi::dpl::sort_by_key(policy, keys_device.begin(),
                             keys_device.end(),
                             param_ids_device.begin());

Which I'm trying to get some help with in: uxlfoundation/oneDPL#1936
Track finding is hard... 😦
🎆 The unit test now succeeds with the NVIDIA backend! 🎆
But the PR still probably needs to stay open until we switch to oneAPI 2025.0.0 in the CI, to make the Intel-backend build work as well. 🤔
Let me catch people up. The SYCL code seems to be functional finally, now that I fixed the very-very silly mistake that I made while translating it from CUDA. I can now reliably run the CKF unit test on our NVIDIA and AMD GPUs.
I can also run the unit test successfully on a CPU.
But unfortunately the SYCL runtime is having a very hard time with JIT-ing the track finding code for our Intel GPU. 😦
(It is just stuck at this point, spinning the CPU at full tilt.) This unfortunately was to be expected. 😦 Since last week, when I tried to ahead-of-time compile the code specifically for our Intel GPU, the build had to be shut down after 3 hours. (It was still trying to link at that point.) So we'll have to have some heart-to-heart with the Intel developers. But at least the other backends seem functional by now. 🥳
The code doesn't do anything, and is not used by anything yet.
Made the "track liveness" buffers use char instead of unsigned int. Since they store boolean information, char is enough. And it also makes memset(...) do a more expected thing on the buffers. Updated device::make_barcode_sequence not to narrow std::size_t into unsigned int. (Which oneAPI doesn't like.)
Mainly to avoid type conversion / narrowing in the code, but also to simplify it slightly in some places.
Trying to avoid confusion at runtime about which kernel is which.
Just as a convenience method for seeing in the SYCL unit tests which exact device is used.
Taught traccc::sycl::test_queue how to figure out what sort of a queue it is. So that the CKF test could be skipped on OpenCL and Level-0 backends as long as those are still not working.
8155284
to
65fd176
Compare
Replaced by #811.
This is finally a CKF algorithm for SYCL. 🥳
Still some technical stuff to set up, I just wanted to open it already, so people would be aware. 😄 (I'll open at least one PR off of this first, which should go in before this one.)